Access public data using R in Oracle

Access public data using R in Oracle (AFSC only)

Access data via Oracle

If the user has access to the AFSC Oracle database, the user can use SQL developer to view and pull the FOSS public data directly from the RACEBASE_FOSS Oracle schema.

Connect to Oracle from R

Many users will want to access the data from Oracle using R. The user will need to install the RODBC R package and ask OFIS (IT) connect R to Oracle. Then, use the following code in R to establish a connection from R to Oracle:

Here, the user can write in their username and password directly into the RODBC connect function. Never save usernames or passwords in scripts that may be intentionally or unintentionally shared with others. If no username and password is entered in the function, pop-ups will appear on the screen asking for the username and password.

 #' Define RODBC connection to ORACLE
 #'
 #' @param schema default = 'AFSC'. 
 #'
 #' @return oracle channel connection
 #' @export
 #'
 #' @examples
 #' # Not run
 #' # channel <- oracle_connect()
oracle_connect <- function(
    schema='AFSC', 
    username = NULL, 
    passowrd = NULL){(echo=FALSE)
  
  library("RODBC")
  library("getPass")
  if (is.null(username)) {
    username <- getPass(msg = "Enter your ORACLE Username: ")
  }
  if (is.null(password)) {
    password <- getPass(msg = "Enter your ORACLE Password: ")
  }
  channel  <- RODBC::odbcConnect(
    paste(schema),
    paste(username),
    paste(password), 
    believeNRows=FALSE)
  return(channel)
}

channel <- oracle_connect()

Ex. 1: Select all data

Once connected, pull and save (if needed) the table into the R environment.

# Pull table from oracle into R environment
a <- RODBC::sqlQuery(channel, "SELECT * FROM RACEBASE_FOSS.FOSS_CPUE_ZEROFILLED")
# Save table to local directory
write.csv(x = a, file = "RACEBASE_FOSS-FOSS_CPUE_ZEROFILLED.csv")

Ex. 2: Subset data

To pull a small subset of the data (especially since files like RACEBASE_FOSS.FOSS_CPUE_ZEROFILLED are so big), use a variation of the following code. Here, we are pulling EBS Pacific cod from 2010 - 2021:

 # Pull data
a <- RODBC::sqlQuery(channel, "SELECT * FROM RACEBASE_FOSS.FOSS_CPUE_ZEROFILLED 
WHERE SRVY = 'EBS' 
AND COMMON_NAME = 'Pacific cod' 
AND YEAR >= 2010 
AND YEAR < 2021")
 # Save table to local directory
write.csv(x = a, file = "RACEBASE_FOSS-FOSS_CPUE_ZEROFILLED-ebs_pcod_2010-2020.csv")

Join catch and haul data

If this file is too large, you can join catch and haul data. Pull the RACEBASE_FOSS.JOIN_FOSS_CPUE_CATCH and RACEBASE_FOSS.JOIN_FOSS_CPUE_HAUL tables, which are much smaller, and combine the table locally yourself.

These datasets, JOIN_FOSS_CPUE_CATCH and JOIN_FOSS_CPUE_HAUL, when full joined by the HAULJOIN variable, includes zero-filled (presence and absence) observations and catch-per-unit-effort (CPUE) estimates for all identified species at for index stations by the Resource Assessment and Conservation Engineering Division (RACE) Groundfish Assessment Program (GAP) of the Alaska Fisheries Science Center (AFSC). There are no legal restrictions on access to the data. The data from this dataset are shared on the Fisheries One Stop Stop (FOSS) platform (https://www.fisheries.noaa.gov/foss/). The GitHub repository for the scripts that created this code can be found at https://github.com/afsc-gap-products/gap_public_data. For more information about codes used in the tables, please refer to the survey code books (https://www.fisheries.noaa.gov/resource/document/groundfish-survey-species-code-manual-and-data-codes-manual). These data were last updated June 01, 2023.

RACEBASE_FOSS.JOIN_FOSS_CPUE_CATCH: 
  rows: 36342299
  cols: 11
  2.627 GB
RACEBASE_FOSS.JOIN_FOSS_CPUE_HAUL: 
  rows: 31608
  cols: 25
  0.007 GB

To join these tables in Oracle, you may use a variant of the following code:

SELECT * FROM RACEBASE_FOSS.JOIN_FOSS_CPUE_HAUL
FULL JOIN RACEBASE_FOSS.JOIN_FOSS_CPUE_CATCH
ON RACEBASE_FOSS.JOIN_FOSS_CPUE_HAUL.HAULJOIN = RACEBASE_FOSS.JOIN_FOSS_CPUE_CATCH.HAULJOIN;
footer.knit

NOAA README

This repository is a scientific product and is not official communication of the National Oceanic and Atmospheric Administration, or the United States Department of Commerce. All NOAA GitHub project code is provided on an ‘as is’ basis and the user assumes responsibility for its use. Any claims against the Department of Commerce or Department of Commerce bureaus stemming from the use of this GitHub project will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the Department of Commerce. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC or the United States Government.

NOAA License

Software code created by U.S. Government employees is not subject to copyright in the United States (17 U.S.C. §105). The United States/Department of Commerce reserve all rights to seek and obtain copyright protection in countries other than the United States for Software authored in its entirety by the Department of Commerce. To this end, the Department of Commerce hereby grants to Recipient a royalty-free, nonexclusive license to use, copy, and create derivative works of the Software outside of the United States.

NOAA Fisheries

U.S. Department of Commerce | National Oceanographic and Atmospheric Administration | NOAA Fisheries